Collective Entity Disambiguation with Structured Gradient Tree Boosting
Abstract
We present a gradient-tree-boosting-based structured learning model for jointly disambiguating named entities in a document. Gradient tree boosting is a widely used machine learning algorithm that underlies many top-performing natural language processing systems. Surprisingly, most prior work limits the use of gradient tree boosting to regular classification or regression problems, despite the structured nature of language. To the best of our knowledge, our work is the first to employ the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation. By defining global features over previous disambiguation decisions and jointly modeling them with local features, our system is able to produce globally optimized entity assignments for mentions in a document. Exact inference is prohibitively expensive for our globally normalized model. To solve this problem, we propose Bidirectional Beam Search with Gold path (BiBSG), an approximate inference algorithm that is a variant of the standard beam search algorithm. BiBSG makes use of global information from both past and future to perform better local search. Experiments on standard benchmark datasets show that SGTB significantly improves upon published results. Specifically, SGTB outperforms the previous state-of-the-art neural system by nearly 1% absolute accuracy on the popular AIDA-CoNLL dataset.
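The abstract does not include implementation details, so the following is only a rough, hypothetical sketch of how global features over previous disambiguation decisions can enter an approximate beam-search decoder. The functions `local_score` and `global_score`, the candidate lists, and the beam size are illustrative assumptions; this is a plain left-to-right beam search, not the authors' BiBSG algorithm, which additionally exploits future (right-to-left) context and the gold path during training.

```python
# Hypothetical sketch of beam search over entity candidates. The score
# functions are assumed interfaces, not the paper's implementation.

def beam_search(mentions, candidates, local_score, global_score, beam_size=4):
    """Return one entity per mention, searching over joint assignments.

    mentions      -- list of mentions, in document order
    candidates[m] -- iterable of candidate entities for mention m
    local_score   -- score of (mention, entity) in isolation
    global_score  -- coherence of an entity with previously chosen entities
    """
    beam = [([], 0.0)]  # each item: (partial assignment, cumulative score)
    for m in mentions:
        expanded = []
        for path, score in beam:
            for e in candidates[m]:
                # local evidence plus global coherence with entities
                # already placed on this partial path
                s = score + local_score(m, e) + global_score(path, e)
                expanded.append((path + [e], s))
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_size]  # keep the top-k partial assignments
    return beam[0][0]  # highest-scoring complete assignment
```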
Similar references
Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values
Gradient boosting of regression trees is a competitive procedure for learning predictive models of continuous data that fits the data with an additive non-parametric model. The classic version of gradient boosting assumes that the data is independent and identically distributed. However, relational data with interdependent, linked instances is now common and the dependencies in such data can be...
Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All
Collective entity disambiguation, or collective entity linking, aims to jointly resolve multiple mentions by linking them to their associated entities in a knowledge base. Previous works are largely based on the underlying assumption that entities within the same document are highly related. However, the extent to which these mentioned entities are actually connected in reality is rarely studied and...
Gradient Boosting for Conditional Random Fields
In this paper, we present a gradient boosting algorithm for tree-shaped conditional random fields (CRFs). Conditional random fields are an important class of models for accurate structured prediction, but effective design of the feature functions is a major challenge when applying CRF models to real-world data. Gradient boosting, which...
Entity Disambiguation using Freebase and Wikipedia
This thesis addresses the problem of entity disambiguation, which involves identifying important phrases in a given text and linking them to the appropriate entities they refer to. For this work, information extracted from both Freebase and Wikipedia served as the knowledge base. A fully functional entity disambiguation tool is made available online, and the challenges involved in each stage of...
Tree-Structured Boosting: Connections Between Gradient Boosted Stumps and Full Decision Trees
Authors: José Marcio Luna, Eric Eaton, Lyle H. Ungar, Eric Diffenderfer, Shane T. Jensen, Efstathios D. Gennatas, Mateo Wirth, Charles B. Simone II, Timothy D. Solberg, and Gilmer Valdes (Dept. of Radiation Oncology and Dept. of Computer and Information Science, University of Pennsylvania, among other affiliations) ...
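Both the main paper and the first related reference above build on classic gradient tree boosting, which fits an additive model of regression trees to pseudo-residuals. The sketch below shows only that i.i.d. base procedure for squared-error regression, under assumed toy interfaces; it is not the structured (SGTB) or collective-inference extension discussed above.

```python
# Minimal sketch of classic gradient tree boosting for regression with
# squared loss: each tree is fit to the residuals (negative gradient) of
# the current additive model. Hypothetical toy interfaces only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbrt(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    base = float(np.mean(y))                 # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                  # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict_gbrt(base, trees, X, learning_rate=0.1):
    # Additive prediction: constant base model plus shrunken tree outputs.
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```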
Journal: CoRR
Volume: abs/1802.10229
Pages: -
Publication date: 2018